Method of detecting a useful signal affected by noise

ABSTRACT

In order to detect a useful signal affected by noise, a measurement is taken of the expected S/N ratio of this signal over a time slice, a measurement of the estimated white noise alone is taken over another time slice without useful signal, the mean energy of the noise and of the noise-affected signal is calculated, in each of their time slices, the theoretical detection threshold is calculated, the ratio of these two energies is calculated, and the ratio is compared with the calculated threshold, this threshold being greater than 1 (ideal threshold).

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method of detecting a useful signal affected by noise.

2. Discussion of the Background

One of the great problems in signal processing, simple to enunciate but very complex to resolve, consists in determining the presence or the absence of a useful signal buried in additive noise.

Various solutions can be envisaged. It is possible to use, as a variable, the instantaneous amplitude of the received or processed signal by reference to an experimentally-determined threshold.

It is also possible to use, as a variable, the energy of the total signal over a time slice of duration T, by thresholding this energy, still experimentally.

These thresholdings allow a first assumption on the presence or the absence of the signal. They are, moreover, applicable to any signal. Hence, they are complemented by "confirmation" systems, defining "near-certain" criteria, specific to the type of useful signal, when the nature of the latter is known in advance.

Such a complementary system is widely used in speech processing and may consist, for example, in extraction of "pitch" or in evaluation of the minimum energy of a vowel.

SUMMARY OF THE INVENTION

The subject of the present invention is a method of detecting a useful signal affected by noise, determining the detection threshold as rigorously as possible, and able to operate self-adaptively.

According to the invention, the expected signal/noise ratio of the signal to be processed is available, and a measurement of the estimated noise alone is available, a measurement enumerated over M points, this noise being white or made to be white, the mean energy of the noise over these M points is calculated, a slice of N points of noise-affected signal is taken, the mean energy of these N points is calculated, the theoretical detection threshold is calculated, the ratio between the two said mean energies is calculated, and this ratio is compared with the said threshold.

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete appreciation of the invention and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:

FIG. 1 illustrates an exemplary embodiment of the invention;

FIG. 2 illustrates a process used by the present invention; and

FIG. 3 illustrates a second embodiment of the present invention.

THE DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring now to the drawings, wherein like reference numerals designate identical or corresponding parts throughout the several views, and more particularly to FIG. 1 thereof, there is illustrated an exemplary hardware embodiment of the present invention. Reference No. 6 designates a speech detector which detects if speech is contained in an input audio sample. The algorithm used by the speech detector 6 requires a measurement of noise alone 2 and a signal which may or may not contain speech. Speech files 4 contain the audio samples/signals which may or may not contain speech. The audio samples can be mixed with noise. The speech detector 6 contains a segmentation coarseness changeover switch 8 which determines if diction of the speech files is to be segmented in a coarse manner.

FIG. 2 illustrates a process which is performed by the present invention. First, in step 10, a signal which is noise alone is measured. In step 12, a combined speech and noise signal is measured. Step 14 then calculates a ratio of the energy of the combined speech and noise signal to the energy of the measured noise signal. Step 16 then calculates a detection threshold which is described in detail below, and step 18 compares the ratio calculated in step 14 with the detection threshold calculated in step 16.

Step 20 then determines if speech is present using the comparison result of step 18. If there is no speech present, the process ends. If step 20 indicates speech is present, flow proceeds to step 22 which outputs a speech detection signal.
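The flow of FIG. 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `mean_energy` and `detect_speech` are hypothetical, and the threshold is passed in as a parameter because its calculation is described further below.

```python
def mean_energy(samples):
    """Mean energy (mean of the squared samples) over one time slice."""
    if not samples:
        raise ValueError("time slice must not be empty")
    return sum(x * x for x in samples) / len(samples)


def detect_speech(noise_slice, signal_slice, threshold):
    """Steps 10 to 22 of FIG. 2, as a sketch.

    noise_slice  : samples measured with noise alone (step 10)
    signal_slice : samples of the combined speech-and-noise signal (step 12)
    threshold    : detection threshold, assumed computed elsewhere (step 16)
    """
    ratio = mean_energy(signal_slice) / mean_energy(noise_slice)  # step 14
    speech_present = ratio > threshold                            # steps 18 and 20
    return ratio, speech_present                                  # basis for step 22
```

The caller would output the speech detection signal of step 22 only when the second returned value is true.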

FIG. 3 illustrates a second embodiment of the invention. The speech detector 6 and segmentation coarseness changeover switch 8 in FIG. 3 operate in a similar manner as elements 6 and 8 illustrated in FIG. 1. The reference No. 2 designates a conventional sound detector which has input thereto both speech and noise signals. The output from the sound detector 2 is connected to the speech files 4 which can store the detected sound for later processing. When the speech detector 6 detects a useful signal such as speech, it outputs a signal indicating speech has been detected.

First of all, it will be explained how, in the ideal case, detection of a signal affected by noise should theoretically be done.

A first item of information u(n) is available for a first time slice such that:

    u(n)=s(n)+x(n)

n being a whole number: 0≦n≦N-1, s(n) being a useful signal and x(n) noise. Moreover, another item of information y(n) is available, with 0≦n≦M-1, and M possibly being equal to or different from N. y(n) is a measure of the noise x(n) over another time slice devoid of useful signal. ##EQU1##

Hence, in an ideal and unrealistic case, this would give, with SNR=signal-to-noise ratio:

    Z=1+SNR

and the simple detection criterion would be:

    Z>1: presence of useful signal

    Z<1: absence of useful signal
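The relation Z=1+SNR can be motivated as follows. This is a sketch which assumes, consistently with the summary of the invention, that U and V denote the mean energies of u(n) over N points and of y(n) over M points, and that s(n) and x(n) are uncorrelated; the limits below hold only in the ideal case where these estimates equal the true variances.

```latex
\begin{align*}
U &= \frac{1}{N}\sum_{n=0}^{N-1} u(n)^2 \;\longrightarrow\; \sigma_u^2 = \sigma_s^2 + \sigma_x^2, \\
V &= \frac{1}{M}\sum_{n=0}^{M-1} y(n)^2 \;\longrightarrow\; \sigma_x^2, \\
Z &= \frac{U}{V} \;\longrightarrow\; \frac{\sigma_s^2 + \sigma_x^2}{\sigma_x^2} = 1 + \mathrm{SNR},
\qquad \mathrm{SNR} = \frac{\sigma_s^2}{\sigma_x^2}.
\end{align*}
```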

According to the present invention, the theoretical threshold of 1 is replaced by a threshold μ, calculated as explained below, which takes account of the fact that the signals available are not perfectly ergodic and that U and V are only estimates of the true value of the variances σ_(u) ² and σ_(x) ².

In order to calculate μ, the following method is used.

Starting with the fact that the variables U and V are random in nature, and that consequently Z also is, then the probability density of Z (which depends on the signal-to-noise ratio) is calculated.

It is then a question, by invoking the principle of maximum likelihood, of determining the best estimate of the signal-to-noise ratio after having calculated the variable Z.

To this end, the abovementioned variable u(n) is measured over one time slice, and the variable y(n) is measured over another time slice in which it is certain that there is no useful signal, but only noise (independent of and decorrelated with s(n)).

In order to determine the density of the random variable Z (which may be described as observed variable), the following method is used. Let X₁ belonging to N (m₁ ; σ₁ ²) and X₂ belonging to N (m₂ ; σ₂ ²) be two independent gaussian random variables for which the probabilities P_(r) {X₁ <0} and P_(r) {X₂ <0} are practically zero.

Let X=X₁ /X₂ and set: m=m₁ /m₂, σ² =σ₁ ² /σ₂ ², α=m₂ /σ₂.

The probability density f_(X) (x) of X is then: ##EQU2## where U(x)=1 if x≧0 and U(x)=0 if x<0. ##EQU3## then: P(x)=P_(r) {X<x}=F [h(x)], an expression in which F(x) designates the distribution function of the normalised gaussian variable.
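As a cross-check of the kind of result used here, the distribution of a ratio of two independent gaussian variables can be sketched numerically. The closed form below follows from writing the event {X₁/X₂ < x} as {X₁ - x·X₂ < 0}, which is legitimate when X₂ is practically always positive; whether it coincides term for term with the expressions of the source, which are not reproduced in this text, is an assumption. The function names are hypothetical.

```python
import math
import random


def normal_cdf(t):
    """Distribution function F of the normalised gaussian variable."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def ratio_cdf(x, m1, s1, m2, s2):
    """P{X1/X2 < x} for independent X1 ~ N(m1, s1^2) and X2 ~ N(m2, s2^2).

    Valid when P{X2 < 0} is practically zero: X1/X2 < x is then the same
    event as X1 - x*X2 < 0, and X1 - x*X2 is gaussian.
    With m = m1/m2, sigma = s1/s2 and alpha = m2/s2 this is F(h(x)), where
    h(x) = alpha * (x - m) / sqrt(x^2 + sigma^2)  (assumed form of h).
    """
    m = m1 / m2
    sigma2 = (s1 / s2) ** 2
    alpha = m2 / s2
    h = alpha * (x - m) / math.sqrt(x * x + sigma2)
    return normal_cdf(h)


def empirical_ratio_cdf(x, m1, s1, m2, s2, trials=20000, seed=0):
    """Monte Carlo estimate of the same probability, for comparison."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(trials)
        if rng.gauss(m1, s1) / rng.gauss(m2, s2) < x
    )
    return hits / trials
```

Comparing `ratio_cdf` with `empirical_ratio_cdf` for means that lie many standard deviations above zero gives a quick check that the closed form behaves as expected.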

Suppose now that the signals s(n), x(n) and y(n) are white, gaussian and centred. ##EQU4##

This latter term is, therefore, itself also white, gaussian and centred; ##EQU5##

Since σ_(s) ² and σ_(x) ² are defined, it is assumed implicitly that calculation of the probability density is done with known σ_(s) ² and σ_(x) ². Thus the density of Z is evaluated knowing σ_(s) ² and σ_(x) ². In this case, U and V follow chi-squared laws, and, for sufficiently large N and M, U and V are approximated by gaussian laws which are practically always positive: ##EQU6## Z is therefore the ratio of two independent gaussian variables. It can easily be demonstrated that U and V are independent. ##EQU7##

The probability density of Z, knowing σ_(s) ² and σ_(x) ², is hence expressed by: ##EQU8## Setting: ##EQU9## such that: f_(z) (z:σ_(s) ², σ_(x) ²)=f_(k,M) (z,σ_(s) ² /σ_(x) ²)

According to the results above relating to the probability density of Z, the probability is deduced. ##EQU10## This gives: P_(r) {Z<z: σ_(s) ² ; σ_(x) ² }=F{h_(k,M) (z,r)}.

The case of any signal s(n) and a gaussian white noise will now be examined.

It is still assumed that the noises x(n) and y(n) are white and gaussian, with σ_(x) ² =E[x(n)² ]=E[y(n)² ]. The useful signal s(n) is assumed to be any signal whatever, independent of the noise.

The new hypothesis used here is to assume that s(n) and x(n) are not correlated in the time sense of the term, that is to say that: ##EQU11##

Whereas the density of Z was previously calculated knowing σ_(s) ² and σ_(x) ², here the calculation is performed knowing μ_(s) ² and σ_(x) ². The density to be calculated will be denoted by f_(z) (z:μ_(s) ², σ_(x) ²).

Knowing μ_(s) ², U=μ_(s) ² +(1/N) Σ_(0≦n≦N-1) x(n)² belongs to N(μ_(s) ² +σ_(x) ² ; (2/N) σ_(x) ⁴), and V belongs to N(σ_(x) ² ; (2/M) σ_(x) ⁴).
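The gaussian approximation quoted for V can be checked numerically. The sketch below, which assumes unit-variance white gaussian noise and uses hypothetical names, draws many slices of M points and compares the sample mean and variance of V with σ_(x) ² and (2/M) σ_(x) ⁴.

```python
import random
import statistics


def simulate_v(M=128, sigma_x=1.0, trials=4000, seed=1):
    """Draw `trials` values of V, the mean energy of M noise samples."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        energy = sum(rng.gauss(0.0, sigma_x) ** 2 for _ in range(M))
        values.append(energy / M)
    return values


values = simulate_v()
mean_v = statistics.fmean(values)      # to be compared with sigma_x^2 = 1
var_v = statistics.pvariance(values)   # to be compared with 2/M = 0.015625
```

The same experiment with a constant-energy term added to each slice would illustrate the law quoted for U.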

Z=U/V is thus approximated by the ratio of two independent gaussian laws. As U and V are independent, the result relating to the probability density of X is applied, with: ##EQU12##

The probability density of Z, knowing μ_(s) ² and σ_(x) ², is then equal to: ##EQU13## such that: f_(z) (z:μ_(s) ², σ_(x) ²)=f_(k,M) (z, μ_(s) ² /σ_(x) ²)

According to the results above relating to the probability density of X, the probability is deduced therefrom. ##EQU14## This gives: P_(r) {Z<z: μ_(s) ², σ_(x) ² }=F (h_(k,M) (z,r))

According to the present invention, activity detection is implemented by having recourse to maximum likelihood.

In the case of processed signals, the probability density of the variable Z, knowing the energies of the useful signal and of the noise, is expressed by a function of the form: f_(k,M) (z,r) where r designates the signal-to-noise ratio. This probability therefore depends on the signal-to-noise ratio. In addition, the decision rule can only be given with an expected signal-to-noise ratio. Therefore let r₀ be this expected signal-to-noise ratio.

Assume that the probability of absence of s(n) is π₀ and that the probability of presence of s(n) is π₁.

Since the probability density f_(k,M) (z,r) is known, the optimum decision rule is given by the general theory of detection and is expressed by: ##EQU15##

It is also possible to express this decision rule in the form: (Z<μ→D=0) and (Z>μ→D=1).

It is then necessary to determine μ by solving, for z=μ, the equation:

    ln[f_(k,M) (z,r₀)]-ln[f_(k,M) (z,0)]-ln(π₀ /π₁)=0.

It is then shown that the error probability is equal to:

    Pe=π₀ [1-F(h_(k,M) (μ,0))]+π₁ F(h_(k,M) (μ,r₀)).

The case of the detection of a gaussian white signal in noise which is itself gaussian and white will now be examined.

The signals s(n), x(n) and y(n) are assumed to be white, gaussian and centered. Let r₀ be the expected signal-to-noise ratio, and k=M/N. The probability of absence of s(n) is π₀ and the probability of presence of s(n) is π₁.

The decision rule is then: ##EQU16##

The threshold is determined by the equality (instead of the inequality) between the two terms of this expression.

It is also possible to express this decision rule in the form: (Z<μ→D=0) and (Z>μ→D=1). For μ, with M=N=128, π₀ =π₁ =1/2 there is obtained, for example:

    ______________________________________
    r₀ in dB      μ
    ______________________________________
    -2            1.27
    -1            1.34
     0            1.41
     1            1.50
     2            1.68
    ______________________________________

The error probability is: Pe=π₀ [1-F(h_(k,M) (μ,0))]+π₁ F(h_(k,M) (μ,r₀))

with: ##EQU17##

Below are given a few values of Pe as a function of r₀. π₀ and π₁ are taken to be equal to 0.5.

    ______________________________________
    r₀ in dB      Pe
    ______________________________________
    -2            0.086
    -1            0.052
     0            0.028
     1            0.013
     2            0.005
    ______________________________________
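The tabulated values can be approached numerically. The sketch below is a reconstruction under assumptions, not the source's own formulas (which are not reproduced in this text): it takes U as N(σ_(s) ² +σ_(x) ² ; (2/N)(σ_(s) ² +σ_(x) ²)²) and V as N(σ_(x) ² ; (2/M) σ_(x) ⁴), applies the standard distribution of a ratio of two independent gaussian variables with a practically positive denominator, and solves the threshold equation by bisection. All function names are hypothetical.

```python
import math


def F(t):
    """Distribution function of the normalised gaussian variable."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def h(z, r, k, M):
    """Assumed h_{k,M}(z, r) for a white gaussian signal, with k = M/N."""
    m = 1.0 + r                      # mean of U divided by mean of V
    return math.sqrt(M / 2.0) * (z - m) / math.sqrt(z * z + k * m * m)


def log_f(z, r, k, M):
    """Logarithm of the assumed density f_{k,M}(z, r) = h'(z) * phi(h(z))."""
    m = 1.0 + r
    s2 = k * m * m
    dh = math.sqrt(M / 2.0) * (s2 + m * z) / (z * z + s2) ** 1.5
    return math.log(dh) - 0.5 * h(z, r, k, M) ** 2 - 0.5 * math.log(2.0 * math.pi)


def threshold(r0, k, M, pi0=0.5, pi1=0.5):
    """Solve ln f(z, r0) - ln f(z, 0) - ln(pi0/pi1) = 0 on [1, 1 + r0]."""
    def g(z):
        return log_f(z, r0, k, M) - log_f(z, 0.0, k, M) - math.log(pi0 / pi1)
    lo, hi = 1.0, 1.0 + r0
    for _ in range(60):              # bisection; g(lo) < 0 < g(hi) is assumed
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def error_probability(mu, r0, k, M, pi0=0.5, pi1=0.5):
    """Pe = pi0 [1 - F(h(mu, 0))] + pi1 F(h(mu, r0))."""
    return pi0 * (1.0 - F(h(mu, 0.0, k, M))) + pi1 * F(h(mu, r0, k, M))


r0 = 10 ** (0 / 10)                  # expected SNR of 0 dB as a linear ratio
mu = threshold(r0, k=1.0, M=128)
pe = error_probability(mu, r0, k=1.0, M=128)
```

For r₀ = 0 dB and M=N=128 this sketch gives μ close to 1.41 and Pe close to 0.028, in line with the tabulated values for that row.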

In one simulation example, gaussian white noise with unity variance was generated. For each frame of 128 points (N=M=128), it was decided at random whether to add a gaussian white signal s(n) exhibiting a signal-to-noise ratio defined in advance. The absence and presence probabilities (π₀ and π₁) are equal to 0.5. A second gaussian white noise with unity variance was generated, which served for calculating the random variable V. Z was calculated for each frame. Then the decision rule was applied and the number of errors was counted.

    ______________________________________
    r₀ in dB      Number of errors over 1000 iterations
    ______________________________________
    -2            73
    -1            43
     0            18
     1            10
     2             2
    ______________________________________

These results corroborate those anticipated from the theoretical calculation.
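A simulation of this kind can be sketched as follows. This is an illustration under assumptions rather than the experiment actually run: it uses Python's standard pseudo-random generator, a fixed seed, and the threshold μ=1.41 tabulated above for r₀ = 0 dB; the function names are hypothetical.

```python
import math
import random


def mean_energy(samples):
    """Mean of the squared samples over one frame."""
    return sum(x * x for x in samples) / len(samples)


def count_errors(r0_db, mu, frames=1000, N=128, M=128, seed=0):
    """Count wrong decisions of the rule Z > mu over `frames` frames."""
    rng = random.Random(seed)
    sigma_s = math.sqrt(10 ** (r0_db / 10))    # unit-variance noise assumed
    errors = 0
    for _ in range(frames):
        present = rng.random() < 0.5           # pi0 = pi1 = 0.5
        u = [rng.gauss(0.0, 1.0) for _ in range(N)]
        if present:
            # Add a white gaussian signal at the requested SNR.
            u = [x + rng.gauss(0.0, sigma_s) for x in u]
        y = [rng.gauss(0.0, 1.0) for _ in range(M)]   # second, independent noise
        z = mean_energy(u) / mean_energy(y)
        decided_present = z > mu
        if decided_present != present:
            errors += 1
    return errors


errors = count_errors(r0_db=0, mu=1.41)
```

The count obtained depends on the seed and on the generator, so only its order of magnitude should be compared with the table.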

The case of any signal s(n) and a gaussian white noise will now be examined.

It is still assumed that the noises x(n) and y(n) are white, gaussian with σ_(x) ² =E[x(n)² ]=E[y(n)² ]. The useful signal s(n) is assumed to be any signal whatever, independent of the noise. Let r₀ be the signal-to-noise ratio expected, k=M/N. The probability of absence of s(n) is π₀ and that of presence of s(n) is π₁.

The decision rule then is: ##EQU18##

It is also possible to express this decision rule in the form: (Z<μ→D=0) and (Z>μ→D=1).

For μ the following values are obtained as a function of r₀, for M=N=128, π₀ =π₁ =1/2.

    ______________________________________
    r₀ in dB      μ
    ______________________________________
    -2            1.30
    -1            1.38
     0            1.48
     1            1.60
     2            1.76
    ______________________________________

Then, moreover: ##EQU19##

Several values of Pe as a function of r₀ are given below. The probabilities π₀ and π₁ are taken to be equal to 0.5.

    ______________________________________
    r₀ in dB      Pe
    ______________________________________
    -2            0.062
    -1            0.032
     0            0.013
     1            0.004
     2            0.001
    ______________________________________
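As a consistency check, the error probability can be evaluated from a tabulated threshold. The sketch assumes U as N(μ_(s) ² +σ_(x) ² ; (2/N) σ_(x) ⁴) and V as N(σ_(x) ² ; (2/M) σ_(x) ⁴), as stated earlier, together with the standard distribution of a ratio of two independent gaussian variables; the resulting expression for h_(k,M) is a reconstruction, since the source's formula is not reproduced in this text, and the function names are hypothetical.

```python
import math


def F(t):
    """Distribution function of the normalised gaussian variable."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def h_any_signal(z, r, k, M):
    """Assumed h_{k,M}(z, r) when only the noise is gaussian, with k = M/N."""
    return math.sqrt(M / 2.0) * (z - (1.0 + r)) / math.sqrt(z * z + k)


def error_probability(mu, r0, k, M, pi0=0.5, pi1=0.5):
    """Pe = pi0 [1 - F(h(mu, 0))] + pi1 F(h(mu, r0))."""
    false_alarm = 1.0 - F(h_any_signal(mu, 0.0, k, M))   # noise taken for signal
    missed = F(h_any_signal(mu, r0, k, M))               # signal taken for noise
    return pi0 * false_alarm + pi1 * missed


pe = error_probability(mu=1.48, r0=1.0, k=1.0, M=128)    # r0 = 0 dB
```

With μ=1.48 (the 0 dB row) this evaluates to approximately 0.013, in line with the table.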

In one simulation example, for each frame of 128 points of white noise generated (N=M=128), it was decided at random to add s(n) to it, which, here, is a sinusoid, exhibiting a signal-to-noise ratio defined in advance. π₁ and π₀ are taken to be equal to 0.5.

A second white noise with unity variance was generated, serving to calculate V. For each frame, Z was calculated and the abovementioned decision rule was applied. The number of errors was counted.

The following results were obtained:

    ______________________________________
    r₀ in dB      Number of errors over 1000 iterations
    ______________________________________
    -2            70
    -1            37
     0            12
     1             6
     2             3
    ______________________________________

These results corroborate those anticipated from the theoretical calculation.

The preceding results, being very general, allow the detection of signals buried in additive noise, even when the signal-to-noise ratio is low, close to 0 dB.

An application will be described below, in which this type of detection may be seen to be very useful.

The algorithms presented apply to the case of speech, as a pre-system for detection of vocal activity.

The choice of the detection threshold depends on the context.

As far as the audio bands used are concerned, a preliminary characterisation of noise and speech, with the aid of measurements based on estimation by maximum likelihood, shows that the vocal signal to be detected exhibits a signal-to-noise ratio of at least 6 dB.

Moreover, the processing system uses signal frames of 128 points, the sampling frequency being 10 kHz.

The variables U and V are both evaluated over 128 points, such that M=N=128.

According to the foregoing, the theoretical detection threshold is deduced at 3.

However, it is impossible to be restricted to this single threshold. Although the noise is relatively stationary, it nevertheless exhibits non-stationary features, which have to be taken into account in order to renew the variable V; this makes it possible to make the algorithm partially adaptive.

Hence a second threshold is introduced, which makes it possible to decide whether the variable V will be renewed or not.

This second threshold is chosen to be 1.25, which corresponds to a noise which adds to the stationary noise exhibiting a signal-to-noise ratio of -2 dB.

The decision rule is then:

If Z<1.25

The processed frame then consists of the same noise as that used as reference. The variable V is replaced by the value of the energy of the processed frame.

It will be noted that, since the decision is to consider the processed frame as representative noise, it would be possible to renew the variable V by forming the mean of the former value of V and of the energy of the frame in question. This leads to changing the value of M (number of points over which V is evaluated) but this operation may induce incorrect operation of the algorithm.

If 1.25<Z<3

The frame is considered as containing non-stationary noise, and devoid of speech.

If 3<Z

The frame is considered to be speech.
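The three-way rule above, together with the renewal of V, can be sketched as follows. The thresholds 1.25 and 3 are those quoted in the text; the function names, the label strings and the handling of exact equality with a threshold (which the text leaves open) are assumptions made for the illustration.

```python
def mean_energy(samples):
    """Mean of the squared samples over one frame."""
    return sum(x * x for x in samples) / len(samples)


def classify_frame(frame, v, low=1.25, high=3.0):
    """Return (label, updated_v) for one frame, given the noise energy v."""
    u = mean_energy(frame)
    z = u / v
    if z < low:
        # Same noise as the reference: V is replaced by this frame's energy.
        return "noise", u
    if z < high:
        # Non-stationary noise, devoid of speech: V is kept unchanged.
        return "non-stationary noise", v
    # Equality with `high` is treated as speech here (an assumption).
    return "speech", v


def label_frames(frames, initial_v):
    """Apply the rule frame by frame, carrying the renewed V along."""
    v = initial_v
    labels = []
    for frame in frames:
        label, v = classify_frame(frame, v)
        labels.append(label)
    return labels
```

Replacing V outright, rather than averaging it with its former value, keeps M unchanged, which matches the caution expressed above about changing the value of M.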

Tests carried out on samples of signals affected by noise have validated this detection.

However, it is recalled that this vocal detection may be improved by use of criteria specific to the speech signal, such as the calculation of "pitch".

The algorithm proposed here concerns the investigation of several examples of signals. It is obvious that for other speech signals exhibiting different signal-to-noise ratios, a new choice of threshold is necessary.

The use of two thresholds is generally preferable.

One application of this algorithm makes it possible to create correct reference files for the voice recognition system in question. Precise segmentation of diction is then necessary.

In one application, a changeover microswitch (microswitch opening and closing) delivers a coarse segmentation of the diction.

The preceding algorithm has been used to refine the segmentation delivered by this changeover switch. A first pass of the algorithm made it possible to specify the start of the dictions. A second pass consisted in reading the speech file "backwards", that is to say starting with the microswitch closure towards microswitch opening. This also made it possible to specify the end of the diction.
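The two passes can be sketched on a list of per-frame decisions. This is a minimal illustration: `refine_segment` and its arguments are hypothetical, and the per-frame speech decisions are assumed to have been produced beforehand by the detection rule.

```python
def refine_segment(is_speech, coarse_start, coarse_end):
    """Refine a coarse [coarse_start, coarse_end] interval of frame indices.

    is_speech : list of booleans, one per frame (output of the detector)
    Returns (start, end) frame indices of the diction, or None if no
    frame in the interval was detected as speech.
    """
    # First pass: read forwards to find the start of the diction.
    start = None
    for i in range(coarse_start, coarse_end + 1):
        if is_speech[i]:
            start = i
            break
    if start is None:
        return None
    # Second pass: read backwards to find the end of the diction.
    end = start
    for i in range(coarse_end, start - 1, -1):
        if is_speech[i]:
            end = i
            break
    return start, end
```

Because the end is located from the far side, a silence detected inside a word does not split the segment.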

This non-causal use of the algorithm is necessary because activity detection is sufficiently precise to detect the presence of silences inside words, which is prejudicial to implementing segmentation for the learning phases.

The same type of application also makes it possible to segment the speech files on which recognition is carried out.

However, this algorithm is obviously not causal, which is prejudicial for real-time use. Hence the necessity of completing this algorithm by a calculation specific to speech processing.

We have demonstrated the existence of optimal detection thresholds, which makes it possible to have a theoretical approach to the problem of estimating the signal-to-noise ratio and, above all, of detection, in the case of white noise and a signal which is known only from its energy over N points when the latter remains relatively stationary.

I claim:
 1. A method for detecting if speech is present in an audio sample, comprising the steps of: detecting noise and generating a noise signal; detecting an audio sample which includes both speech and noise and generating an audio signal; determining an energy of the noise signal; determining an energy of the audio signal; calculating a ratio of the energy of the audio signal to the energy of the noise signal; calculating a detection threshold; and comparing the calculated ratio with the calculated detection threshold and outputting a comparison result which indicates one of a presence and absence of speech in the audio sample.
 2. A method according to claim 1, further comprising the step of: calculating a second detection threshold; wherein said comparing step comprises the substeps of: comparing the calculated ratio with the first calculated detection threshold and outputting a first comparison result; and comparing the calculated ratio with the second calculated detection threshold and outputting a second comparison result; and wherein said outputting of the comparison result outputs the comparison result using both said first and second comparison results.
 3. A method according to claim 1, further comprising the steps of: determining if said noise signal is a white noise signal; and converting said noise signal to a noise signal containing white noise, when said step of determining if said noise signal is a white noise signal determines that said noise signal is not a white noise signal.
 4. A method according to claim 1, wherein: said step of determining the energy of the noise signal determines the energy of the noise signal over N sampling slices; and said step of determining the energy of the audio signal determines the energy of the audio signal over M sampling slices.
 5. A method according to claim 4, wherein: the step of calculating the detection threshold calculates the detection threshold for: ##EQU20## where r₀ is an expected signal to noise ratio, K=M/N, π₀ is a probability of an absence of the useful signal, and π₁ is a probability of a presence of the useful signal.
 6. A method according to claim 4, wherein: the step of calculating the detection threshold calculates the detection threshold for: ##EQU21## where r₀ is an expected signal to noise ratio, K=M/N, π₀ is a probability of an absence of the useful signal, and π₁ is a probability of a presence of the useful signal.
 7. An apparatus for detecting if speech is present in an audio sample, comprising: first energy determination means for determining an energy of a measured noise signal; a speech file for storing an audio sample which includes both speech and noise; second energy determination means, connected to the speech file for determining an energy of the stored audio sample; first calculating means for calculating a ratio of the energy of the stored audio sample to an energy of the noise signal, connected to the first and second energy determination means; second calculating means for calculating a detection threshold; and means for comparing the calculated ratio with the calculated detection threshold and outputting a comparison result which indicates one of a presence and absence of speech in the audio sample, connected to the first and second calculating means.
 8. An apparatus according to claim 7, further comprising: means for calculating a second detection threshold, connected to said comparing means; wherein said comparing means comprises: means for comparing the calculated ratio with the first calculated detection threshold and outputting a first comparison result; and means for comparing the calculated ratio with the second calculated detection threshold and outputting a second comparison result; and wherein said outputting of the comparison result by the comparing means outputs the comparison result using both said first and second comparison results.
 9. An apparatus according to claim 7, further comprising: white noise determination means for determining if said noise signal is a white noise signal, connected to the first energy determination means; conversion means, connected to said white noise determination means and said first energy detection means, for converting said noise signal to a noise signal containing white noise.
 10. An apparatus according to claim 7, wherein: said first energy determination means determines the energy of the noise signal over N sampling slices; and said second energy determination means determines the energy of the audio signal over M sampling slices.
 11. An apparatus according to claim 10, wherein: the means for calculating the detection threshold calculates the detection threshold for: ##EQU22## where r₀ is an expected signal to noise ratio, k=M/N, π₀ is a probability of an absence of the useful signal, and π₁ is a probability of a presence of the useful signal.
 12. An apparatus according to claim 10, wherein: the means for calculating the detection threshold calculates the detection threshold for: ##EQU23## where r₀ is an expected signal to noise ratio, k=M/N, π₀ is a probability of an absence of the useful signal, and π₁ is a probability of a presence of the useful signal.
 13. An apparatus according to claim 7, further comprising: a segmentation means, connected to the speech file, for segmenting diction contained in the speech file; and a switch connected to the segmentation means; wherein a coarseness of segmentation performed by the segmentation means is determined using a setting of said switch.
 14. An apparatus for detecting if speech is present in an audio sample, comprising: first energy determination means for determining an energy of a measured noise signal; a sound detector; second energy determination means, connected to the sound detector, for determining an energy of an audio sample containing noise and speech detected by the sound detector; first calculating means, connected to the first and second energy determination means, for calculating a ratio of the energy of the audio sample to an energy of the noise signal; second calculating means for calculating a detection threshold; and means for comparing the calculated ratio with the calculated detection threshold and outputting a comparison result which indicates one of a presence and absence of speech in the audio sample, connected to the first and second calculating means.
 15. An apparatus according to claim 14, further comprising: means for calculating a second detection threshold; wherein said comparing means comprises: means for comparing the calculated ratio with the first calculated detection threshold and outputting a first comparison result; and means for comparing the calculated ratio with the second calculated detection threshold and outputting a second comparison result; and wherein said outputting of the comparison result by the comparing means outputs the comparison result using both said first and second comparison results.
 16. An apparatus according to claim 14, further comprising: white noise determination means for determining if said noise signal is a white noise signal, connected to the first energy determination means; conversion means, connected to said white noise determination means and said first energy detection means, for converting said noise signal to a noise signal containing white noise.
 17. An apparatus according to claim 14, wherein: said first energy determination means determines the energy of the noise signal over N sampling slices; and said second energy determination means determines the energy of the audio signal over M sampling slices.
 18. An apparatus according to claim 17, wherein: the means for calculating the detection threshold calculates the detection threshold for: ##EQU24## where r₀ is an expected signal to noise ratio, k=M/N, π₀ is a probability of an absence of the useful signal, and π₁ is a probability of a presence of the useful signal.
 19. An apparatus according to claim 17, wherein: the means for calculating the detection threshold calculates the detection threshold for: ##EQU25## where r₀ is an expected signal to noise ratio, k=M/N, π₀ is a probability of an absence of the useful signal, and π₁ is a probability of a presence of the useful signal.